Robust pitch tracking for prosodic modeling in telephone speech
Authors
Abstract
In this paper, we introduce a pitch detection algorithm that is particularly robust for telephone speech and prosodic modeling. The algorithm uses a logarithmically sampled spectral representation of speech, similar to that in the subharmonic summation approach [2]. Constraints for log F0 and Δ log F0 are combined in a dynamic programming search to find an optimum pitch track. The search algorithm is able to find a continuous pitch contour regardless of the voicing status, while a separate voicing decision module computes a probability of voicing per frame. We evaluated the algorithm using the Keele pitch extraction reference database [4] under both studio and telephone conditions. Our algorithm is very robust to channel degradation, and compares favorably to xwaves under telephone conditions. It also significantly outperforms xwaves when used for tone classification on a telephone-quality Mandarin digit corpus.
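The dynamic programming search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the constraint functions, so the per-frame score matrix `scores`, the candidate grid `log_f0_grid`, the squared Δ log F0 penalty, and the weight `jump_penalty` are all assumptions chosen for illustration.

```python
import numpy as np

def track_pitch(scores, log_f0_grid, jump_penalty=10.0):
    """Viterbi-style search for a continuous pitch contour.

    scores[t, k] is a hypothetical evidence value for candidate
    log_f0_grid[k] in frame t (higher is better). jump_penalty weights
    the squared change in log F0 between consecutive frames. Both the
    score definition and the penalty form are illustrative assumptions.
    """
    scores = np.asarray(scores, dtype=float)
    grid = np.asarray(log_f0_grid, dtype=float)
    n_frames, n_cand = scores.shape

    # trans[j, k]: cost of moving from candidate j to candidate k.
    trans = jump_penalty * (grid[:, None] - grid[None, :]) ** 2

    best = scores[0].copy()                      # best path score ending at each candidate
    back = np.zeros((n_frames, n_cand), dtype=int)
    for t in range(1, n_frames):
        total = best[:, None] - trans            # total[j, k]: arrive at k from j
        back[t] = np.argmax(total, axis=0)       # best predecessor for each k
        best = total[back[t], np.arange(n_cand)] + scores[t]

    # Trace the best path backwards from the final frame.
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(best))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return np.exp(grid[path])                    # F0 in Hz, one value per frame
```

With a nonzero `jump_penalty`, an isolated frame whose strongest candidate lies an octave away from its neighbours is overruled by the continuity term; with `jump_penalty=0` the search reduces to picking the best candidate in each frame independently.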
Similar resources
The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners
The present research addressed the prosodic realization of global and local text structure and content in the spoken discourse data produced by Persian EFL learners. Two newspaper articles were analyzed using Rhetorical Structure Theory. Based on these analyses, the global structure in terms of hierarchical level, the local structure in terms of the relative importance of text segments and the ...
Modeling the prosody of hidden events for improved word recognition
We investigate a new approach for using speech prosody as a knowledge source for speech recognition. The idea is to penalize word hypotheses that are inconsistent with prosodic features such as duration and pitch. To model the interaction between words and prosody we modify the language model to represent hidden events such as sentence boundaries and various forms of disfluency, and combine wit...
Statistical prosodic modeling: from corpus design to parameter estimation
The increasing availability of carefully designed and collected speech corpora opens up new possibilities for the statistical estimation of formal multivariate prosodic models. At Apple Computer, statistical prosodic modeling exploits the Victoria corpus, recently created to broadly support ongoing speech synthesis research and development. This corpus is composed of five constituent parts, eac...
Improved generation of prosodic features in HMM-based Mandarin speech synthesis
The HMM-based Text-to-Speech System can produce high quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS durations are typically modeled statistically using state duration probabili...
Symbolic and Direct Sequential Modeling of Prosody for Classification of Speaking-Style and Nativeness
In this paper, we explore the differences between direct and symbolic sequential modeling of prosody. We use sequential models to characterize speech in two tasks, classifying speaking-style and distinguishing native from non-native speech. We explore the use of a spike-and-slab model to directly model pitch contour data. We find in both of these tasks that sequences of symbolic prosodic events...
Journal title:
Volume, issue:
Pages: -
Publication date: 2000